Siyi Wang

Towards Unified Vision-Language Models with Incomplete Multi-Modal Inputs

May 27, 2026
Via arXiv

Why Can't They Remember? Uncovering Representation and Retrieval Bottlenecks in Multi-Turn Acoustic Memory

May 26, 2026
Via arXiv

Rethinking Continual Learning for Speech and Audio: A Representation-Centric Taxonomy and Open Problems

May 24, 2026
Via arXiv

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Apr 03, 2026
Via arXiv

CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering

Feb 03, 2026
Via arXiv

Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models

Jan 21, 2026
Via arXiv

Risk-Averse Learning with Varying Risk Levels

Dec 28, 2025
Via arXiv

Token-Level Logits Matter: A Closer Look at Speech Foundation Models for Ambiguous Emotion Recognition

May 24, 2025
Via arXiv

Depth-Based Local Center Clustering: A Framework for Handling Different Clustering Scenarios

May 14, 2025
Via arXiv

An Investigation into Value Misalignment in LLM-Generated Texts for Cultural Heritage

Jan 03, 2025
Via arXiv